Solving stable matching problems using answer set programming
Since the introduction of the stable marriage problem (SMP) by Gale and
Shapley (1962), several variants and extensions have been investigated. While
this variety is useful to widen the application potential, each variant
requires a new algorithm for finding the stable matchings. To address this
issue, we propose an encoding of the SMP using answer set programming (ASP),
which can straightforwardly be adapted and extended to suit the needs of
specific applications. The use of ASP also means that we can take advantage of
highly efficient off-the-shelf solvers. To illustrate the flexibility of our
approach, we show how our ASP encoding naturally allows us to select optimal
stable matchings, i.e. matchings that are optimal according to some
user-specified criterion. To the best of our knowledge, our encoding offers the
first exact implementation to find sex-equal, minimum regret, egalitarian or
maximum cardinality stable matchings for SMP instances in which individuals may
designate unacceptable partners and ties between preferences are allowed.
This paper is under consideration in Theory and Practice of Logic Programming
(TPLP).
Comment: arXiv admin note: substantial text overlap with arXiv:1302.725
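To make the setting concrete, here is a minimal Python sketch of a stability check for instances with ties and unacceptable partners. It is not the ASP encoding the abstract describes. It assumes weak stability (a pair blocks only if both members strictly prefer each other), and the tiered preference format and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Preference lists with ties: a list of tiers, best tier first.
# Anyone missing from a person's tiers is unacceptable to that person.
Prefs = Dict[str, List[List[str]]]

UNACCEPTABLE = float("inf")


def rank(prefs: Prefs, person: str, other: Optional[str]) -> float:
    """Tier index of `other` for `person`; being single ranks below every acceptable partner."""
    if other is None:
        return float(len(prefs[person]))
    for tier_index, tier in enumerate(prefs[person]):
        if other in tier:
            return float(tier_index)
    return UNACCEPTABLE


def is_weakly_stable(men: Prefs, women: Prefs, matching: Dict[str, str]) -> bool:
    """`matching` maps each matched man to his partner; unmatched people are absent."""
    partner_of_woman = {w: m for m, w in matching.items()}
    # Every matched pair must be mutually acceptable.
    for m, w in matching.items():
        if rank(men, m, w) == UNACCEPTABLE or rank(women, w, m) == UNACCEPTABLE:
            return False
    # No pair may strictly prefer each other to their current situation.
    for m in men:
        for w in women:
            m_gain = rank(men, m, w) < rank(men, m, matching.get(m))
            w_gain = rank(women, w, m) < rank(women, w, partner_of_woman.get(w))
            if m_gain and w_gain:
                return False
    return True


# Toy instance: m1 is indifferent between w1 and w2; m2 only accepts w1.
men = {"m1": [["w1", "w2"]], "m2": [["w1"]]}
women = {"w1": [["m2"], ["m1"]], "w2": [["m1"]]}

stable = is_weakly_stable(men, women, {"m1": "w2", "m2": "w1"})
unstable = is_weakly_stable(men, women, {"m1": "w1"})  # m2 and w1 block
```

A checker like this only verifies a candidate matching. Finding an optimal stable matching (sex-equal, egalitarian, and so on) requires a search over matchings, which is the part the abstract delegates to an ASP solver.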
Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Many real-world decision problems are characterized by multiple conflicting
objectives which must be balanced based on their relative importance. In the
dynamic weights setting the relative importance changes over time and
specialized algorithms that deal with such change, such as a tabular
Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are
required. However, this earlier work is not feasible for RL settings that
necessitate the use of function approximators. We generalize across weight
changes and high-dimensional inputs by proposing a multi-objective Q-network
whose outputs are conditioned on the relative importance of objectives and we
introduce Diverse Experience Replay (DER) to counter the inherent
non-stationarity of the Dynamic Weights setting. We perform an extensive
experimental evaluation and compare our methods to adapted algorithms from Deep
Multi-Task/Multi-Objective Reinforcement Learning and show that our proposed
network in combination with DER dominates these adapted algorithms across
weight change scenarios and problem domains.
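The idea of a Q-function whose outputs depend on the current objective weights can be sketched as follows. This is an illustration only: the linear scalarization, the `QFunction` signature, and the `toy_q` stand-in for a trained network are all assumptions, not details taken from the abstract.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained network: maps (state, weights) to one
# vector of per-objective Q-values per action.
QFunction = Callable[[Sequence[float], Sequence[float]], List[List[float]]]


def scalarize(q_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization (an assumption): weighted sum over objectives."""
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_fn: QFunction, state: Sequence[float], weights: Sequence[float]) -> int:
    """Pick the action whose multi-objective Q-vector scores highest under `weights`."""
    q_values = q_fn(state, weights)
    scores = [scalarize(q, weights) for q in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_q(state: Sequence[float], weights: Sequence[float]) -> List[List[float]]:
    # Fixed table for illustration: action 0 favours objective 0,
    # action 1 favours objective 1. A real network would use both inputs.
    return [[1.0, 0.0], [0.0, 1.0]]


action_a = greedy_action(toy_q, [0.0], [0.9, 0.1])
action_b = greedy_action(toy_q, [0.0], [0.2, 0.8])
```

Changing the weight vector changes the greedy action without retraining, which is the behaviour the dynamic weights setting calls for.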
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs, by using a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam.
Comment: AAMAS 2018. Source code at
https://github.com/lmzintgraf/gp_pref_elici
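One way to see how a ranking query relates to pairwise comparisons: a single ranking of k items implies k(k-1)/2 ordered pairs. The sketch below shows that expansion. Whether the paper feeds rankings to the Gaussian process in exactly this form is an assumption, and `ranking_to_pairs` is a hypothetical helper.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def ranking_to_pairs(ranking: Sequence[str]) -> List[Tuple[str, str]]:
    """Expand a ranking (best first) into (preferred, less_preferred) pairs."""
    # combinations() keeps input order, so the first element of each pair
    # always appears earlier in the ranking than the second.
    return list(combinations(ranking, 2))


pairs = ranking_to_pairs(["policy_a", "policy_b", "policy_c"])
```

Ranking three items therefore yields three comparisons from one user interaction, where a purely pairwise strategy would need three separate queries.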
Dealing with Expert Bias in Collective Decision-Making
Many real-world problems can be formulated as decision-making problems
wherein one must repeatedly make an appropriate choice from a set of
alternatives. Expert judgements, whether human or artificial, can help in
taking correct decisions, especially when exploration of alternative solutions
is costly. As expert opinions might deviate, the problem of finding the right
alternative can be approached as a collective decision-making (CDM) problem.
Current state-of-the-art approaches to solve CDM are limited by the quality of
the best expert in the group, and perform poorly if experts are not qualified
or if they are overly biased, thus potentially derailing the decision-making
process. In this paper, we propose a new algorithmic approach based on
contextual multi-armed bandit problems (CMAB) to identify and counteract such
biased expertise. We explore homogeneous, heterogeneous and polarised expert
groups and show that this approach is able to effectively exploit the
collective expertise, irrespective of whether the provided advice is directly
conducive to good performance, outperforming state-of-the-art methods,
especially when the quality of the provided expertise degrades. Our novel
CMAB-inspired approach achieves a higher final performance and does so while
converging more rapidly than previous adaptive algorithms, especially when
heterogeneous expertise is readily available.
Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence
Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-)optimal solutions for any objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
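The control flow described above (parallel Q-estimates per objective, a confidence score, then action selection from the most confident objective) can be sketched in tabular form. The abstract does not specify the confidence metric, so the one used here (gap between the best and second-best action value) is a placeholder assumption, as are the class name and the Q-learning update.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


class AdaptiveObjectiveSelector:
    """Tabular sketch; requires at least two actions."""

    def __init__(self, n_objectives: int, n_actions: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.n_objectives = n_objectives
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        # q[o][state] is the list of action values for objective o.
        self.q = [defaultdict(lambda: [0.0] * n_actions)
                  for _ in range(n_objectives)]

    def update(self, state: Hashable, action: int,
               rewards: Sequence[float], next_state: Hashable) -> None:
        """One Q-learning step per objective, all from the same transition."""
        for o, reward in enumerate(rewards):
            target = reward + self.gamma * max(self.q[o][next_state])
            self.q[o][state][action] += self.alpha * (target - self.q[o][state][action])

    def confidence(self, objective: int, state: Hashable) -> float:
        """Placeholder metric (assumption): best minus second-best action value."""
        values = sorted(self.q[objective][state], reverse=True)
        return values[0] - values[1]

    def select_action(self, state: Hashable) -> Tuple[int, int]:
        """Return (chosen objective, greedy action under that objective)."""
        best_obj = max(range(self.n_objectives),
                       key=lambda o: self.confidence(o, state))
        values = self.q[best_obj][state]
        return best_obj, max(range(self.n_actions), key=values.__getitem__)


agent = AdaptiveObjectiveSelector(n_objectives=2, n_actions=2)
# Only the second signal (e.g. a shaping reward) is informative here.
agent.update("s", action=1, rewards=[0.0, 1.0], next_state="t")
chosen = agent.select_action("s")
```

After the single update, only the second objective distinguishes between actions in state `"s"`, so the selector follows its estimates. Any other confidence measure can be swapped in by overriding `confidence`.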